Overview

Dataset statistics

Number of variables: 24
Number of observations: 73,799
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 14.1 MiB
Average record size in memory: 200.0 B
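The average record size can be cross-checked from the totals in this block. The sketch below assumes that MiB means 2^20 bytes, and that the 200 B per record corresponds to 24 columns plus an index at 8 bytes each. That layout is an inference from the numbers, not something the report states.

```python
MIB = 1024 ** 2  # bytes per MiB (assumed binary prefix)

n_variables = 24
n_observations = 73_799
total_bytes = 14.1 * MIB  # "Total size in memory", already rounded in the report

avg_record_bytes = total_bytes / n_observations

# Hypothetical layout: every column plus the index stored as one 8-byte value.
assumed_record_bytes = (n_variables + 1) * 8

print(f"average record size ~ {avg_record_bytes:.1f} B")
print(f"assumed layout      = {assumed_record_bytes} B")
```

The small gap between the two figures is what one would expect from the rounding of 14.1 MiB.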

Variable types

Numeric: 8
Categorical: 15
DateTime: 1

Warnings

sample has constant value "1" (Constant)
sna is highly correlated with first_time (High correlation)
first_time is highly correlated with sna (High correlation)
car is highly correlated with car_type (High correlation)
car_type is highly correlated with car (High correlation)
home_address_1 is highly correlated with home_address_2 and 2 other fields (High correlation)
home_address_2 is highly correlated with home_address_1 and 2 other fields (High correlation)
work_address_2 is highly correlated with home_address_1 and 2 other fields (High correlation)
work_address_3 is highly correlated with home_address_1 and 2 other fields (High correlation)
work_address_3 is highly correlated with work_address_2 and 3 other fields (High correlation)
work_address_2 is highly correlated with work_address_3 and 2 other fields (High correlation)
car is highly correlated with car_type (High correlation)
home_address_1 is highly correlated with work_address_3 and 3 other fields (High correlation)
first_time is highly correlated with sna (High correlation)
car_type is highly correlated with car (High correlation)
sna is highly correlated with first_time (High correlation)
work_address_1 is highly correlated with work_address_3 and 2 other fields (High correlation)
home_address_2 is highly correlated with work_address_3 and 3 other fields (High correlation)
sna is highly correlated with sample (High correlation)
work_address_3 is highly correlated with sample and 3 other fields (High correlation)
sample is highly correlated with sna and 13 other fields (High correlation)
work_address_2 is highly correlated with work_address_3 and 3 other fields (High correlation)
car is highly correlated with sample and 1 other field (High correlation)
work_address_1 is highly correlated with sample (High correlation)
home_address_3 is highly correlated with sample (High correlation)
foreign_passport is highly correlated with sample (High correlation)
home_address_1 is highly correlated with work_address_3 and 3 other fields (High correlation)
home_address_2 is highly correlated with work_address_3 and 3 other fields (High correlation)
default is highly correlated with sample (High correlation)
first_time is highly correlated with sample (High correlation)
sex_male is highly correlated with sample (High correlation)
car_type is highly correlated with sample and 1 other field (High correlation)
good_work is highly correlated with sample (High correlation)
client_id is uniformly distributed (Uniform)
client_id has unique values (Unique)
decline_app_cnt has 61214 (82.9%) zeros (Zeros)
bki_request_cnt has 19381 (26.3%) zeros (Zeros)
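
The warnings flag `sample` as constant and also as highly correlated with 14 other fields. A constant column has zero variance, so a product-moment correlation with it is undefined. Those flags are therefore probably an artifact of how a degenerate column is handled, not a real relationship. The sketch below illustrates the point with a hand-written helper; `pearson` and the toy values are illustrative and are not taken from pandas-profiling or from the dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan  # undefined for a constant column
    return cov / math.sqrt(var_x * var_y)

sample = [1, 1, 1, 1, 1]   # constant, like the `sample` column
default = [0, 1, 0, 0, 1]  # toy values, not taken from the dataset

r_constant = pearson(sample, default)
r_self = pearson(default, default)
print(r_constant, r_self)
```

Dropping `sample` before computing correlations would likely remove most of the single-field "highly correlated with sample" entries.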

Reproduction

Analysis started: 2021-07-04 07:33:55.775719
Analysis finished: 2021-07-04 07:34:48.545526
Duration: 52.77 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
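
The duration follows directly from the two timestamps. The sketch below checks it, and also shows how a report of this kind might be regenerated with pandas-profiling. `build_report`, its output path, and the title string are placeholders; the configuration actually used for this run is in the downloadable config.json and is not reproduced here.

```python
from datetime import datetime

started = datetime.fromisoformat("2021-07-04 07:33:55.775719")
finished = datetime.fromisoformat("2021-07-04 07:34:48.545526")
duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")

def build_report(df, out_path="report.html"):
    """Profile a DataFrame with pandas-profiling (imported lazily).

    Hypothetical helper: default settings, placeholder title and path.
    """
    from pandas_profiling import ProfileReport

    profile = ProfileReport(df, title="Dataset profile")
    profile.to_file(out_path)
    return profile
```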

Variables

client_id
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 73799
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 55137.96509
Minimum: 1
Maximum: 110147
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 5469.9
Q1: 27440.5
median: 55274
Q3: 82758.5
95-th percentile: 104666.1
Maximum: 110147
Range: 110146
Interquartile range (IQR): 55318

Descriptive statistics

Standard deviation: 31841.92117
Coefficient of variation (CV): 0.5774953993
Kurtosis: -1.203837426
Mean: 55137.96509
Median Absolute Deviation (MAD): 27663
Skewness: -0.003625146989
Sum: 4069126686
Variance: 1013907944
Monotonicity: Not monotonic
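
Several of the client_id figures are derived from others in the same section. The sketch below recomputes them from the reported inputs, assuming the usual definitions (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared).

```python
# Values copied from the client_id section of the report.
minimum, maximum = 1, 110147
q1, q3 = 27440.5, 82758.5
mean = 55137.96509
std = 31841.92117

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance

print(value_range, iqr)
print(f"CV = {cv:.6f}, variance = {variance:.4e}")
```

The kurtosis near -1.2 and skewness near 0 are what a uniform distribution would give, which fits the UNIFORM flag on this identifier column.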
Histogram with fixed size bins (bins=50)
Value                  Count  Frequency (%)
2049                   1      < 0.1%
85187                  1      < 0.1%
85507                  1      < 0.1%
83458                  1      < 0.1%
87552                  1      < 0.1%
42494                  1      < 0.1%
48637                  1      < 0.1%
46588                  1      < 0.1%
36347                  1      < 0.1%
34298                  1      < 0.1%
Other values (73789)   73789  > 99.9%

Value   Count  Frequency (%)
1       1      < 0.1%
2       1      < 0.1%
3       1      < 0.1%
5       1      < 0.1%
6       1      < 0.1%
7       1      < 0.1%
8       1      < 0.1%
10      1      < 0.1%
11      1      < 0.1%
12      1      < 0.1%

Value   Count  Frequency (%)
110147  1      < 0.1%
110146  1      < 0.1%
110145  1      < 0.1%
110143  1      < 0.1%
110142  1      < 0.1%
110141  1      < 0.1%
110140  1      < 0.1%
110139  1      < 0.1%
110138  1      < 0.1%
110137  1      < 0.1%

score_bki
Real number (ℝ)

Distinct: 69096
Distinct (%): 93.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -1.904723627
Minimum: -3.62458632
Maximum: 0.19977285
Zeros: 0
Zeros (%): 0.0%
Negative: 73790
Negative (%): > 99.9%
Memory size: 1.1 MiB

Quantile statistics

Minimum: -3.62458632
5-th percentile: -2.695736607
Q1: -2.259533835
median: -1.92082293
Q3: -1.56983126
95-th percentile: -1.05476331
Maximum: 0.19977285
Range: 3.82435917
Interquartile range (IQR): 0.689702575

Descriptive statistics

Standard deviation: 0.4982310572
Coefficient of variation (CV): -0.2615765616
Kurtosis: -0.1478822814
Mean: -1.904723627
Median Absolute Deviation (MAD): 0.34447367
Skewness: 0.1941627832
Sum: -140566.699
Variance: 0.2482341863
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count  Frequency (%)
-1.77526279            352    0.5%
-2.22500363            296    0.4%
-2.1042109             288    0.4%
-2.16966378            258    0.3%
-1.92082293            185    0.3%
-2.02410005            180    0.2%
-2.38726804            169    0.2%
-1.52642194            145    0.2%
-2.44723899            142    0.2%
-2.35305175            116    0.2%
Other values (69086)   71668  97.1%

Value        Count  Frequency (%)
-3.62458632  1      < 0.1%
-3.59798083  1      < 0.1%
-3.58258691  1      < 0.1%
-3.52130561  1      < 0.1%
-3.51556161  1      < 0.1%
-3.49268613  1      < 0.1%
-3.4916199   1      < 0.1%
-3.48564255  1      < 0.1%
-3.45608632  3      < 0.1%
-3.45570913  1      < 0.1%

Value        Count  Frequency (%)
0.19977285   2      < 0.1%
0.1980699    1      < 0.1%
0.18361297   1      < 0.1%
0.16854933   1      < 0.1%
0.05603462   1      < 0.1%
0.04595876   1      < 0.1%
0.02390815   1      < 0.1%
0.0216169    1      < 0.1%
-0.00680477  1      < 0.1%
-0.01230332  1      < 0.1%

decline_app_cnt
Real number (ℝ≥0)

ZEROS

Distinct: 21
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2757489939
Minimum: 0
Maximum: 33
Zeros: 61214
Zeros (%): 82.9%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2
Maximum: 33
Range: 33
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8042721438
Coefficient of variation (CV): 2.916682061
Kurtosis: 95.63484404
Mean: 0.2757489939
Median Absolute Deviation (MAD): 0
Skewness: 6.356795661
Sum: 20350
Variance: 0.6468536813
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=21)
Value              Count  Frequency (%)
0                  61214  82.9%
1                  8397   11.4%
2                  2468   3.3%
3                  903    1.2%
4                  414    0.6%
5                  165    0.2%
6                  113    0.2%
7                  48     0.1%
9                  24     < 0.1%
8                  19     < 0.1%
Other values (11)  34     < 0.1%

Value  Count  Frequency (%)
0      61214  82.9%
1      8397   11.4%
2      2468   3.3%
3      903    1.2%
4      414    0.6%
5      165    0.2%
6      113    0.2%
7      48     0.1%
8      19     < 0.1%
9      24     < 0.1%

Value  Count  Frequency (%)
33     1      < 0.1%
24     1      < 0.1%
22     1      < 0.1%
19     1      < 0.1%
16     2      < 0.1%
15     1      < 0.1%
14     2      < 0.1%
13     3      < 0.1%
12     2      < 0.1%
11     9      < 0.1%
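
The three decline_app_cnt tables together cover 20 of the 21 distinct values, which is enough to rebuild the whole distribution and check it against the reported Sum and Mean. In the sketch below the missing value is assumed to be 10 (the only integer between the ten smallest and ten largest values). Its count is inferred from the total number of observations and is not printed in the report.

```python
# value -> count, read from the frequency tables for decline_app_cnt.
counts = {
    0: 61214, 1: 8397, 2: 2468, 3: 903, 4: 414,
    5: 165, 6: 113, 7: 48, 8: 19, 9: 24,
    11: 9, 12: 2, 13: 3, 14: 2, 15: 1,
    16: 2, 19: 1, 22: 1, 24: 1, 33: 1,
}
n_observations = 73_799

# Inferred, not printed in the report: the remaining distinct value is 10.
counts[10] = n_observations - sum(counts.values())

total = sum(value * count for value, count in counts.items())
mean = total / n_observations
zero_share = counts[0] / n_observations

print(counts[10], total)
print(f"mean = {mean:.10f}, zeros = {zero_share:.1%}")
```

The reconstructed total matching the reported Sum is a reasonable sign that the value and count columns were separated correctly.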

bki_request_cnt
Real number (ℝ≥0)

ZEROS

Distinct: 38
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.000338758
Minimum: 0
Maximum: 53
Zeros: 19381
Zeros (%): 26.3%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 3
95-th percentile: 6
Maximum: 53
Range: 53
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.252072623
Coefficient of variation (CV): 1.125845617
Kurtosis: 22.92302298
Mean: 2.000338758
Median Absolute Deviation (MAD): 1
Skewness: 3.035199187
Sum: 147623
Variance: 5.0718311
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=38)
Value              Count  Frequency (%)
0                  19381  26.3%
1                  18276  24.8%
2                  13749  18.6%
3                  9187   12.4%
4                  5627   7.6%
5                  3268   4.4%
6                  1704   2.3%
7                  852    1.2%
8                  508    0.7%
9                  299    0.4%
Other values (28)  948    1.3%

Value  Count  Frequency (%)
0      19381  26.3%
1      18276  24.8%
2      13749  18.6%
3      9187   12.4%
4      5627   7.6%
5      3268   4.4%
6      1704   2.3%
7      852    1.2%
8      508    0.7%
9      299    0.4%

Value  Count  Frequency (%)
53     1      < 0.1%
46     1      < 0.1%
45     1      < 0.1%
41     1      < 0.1%
36     1      < 0.1%
34     1      < 0.1%
33     1      < 0.1%
32     2      < 0.1%
29     1      < 0.1%
28     4      < 0.1%

income
Real number (ℝ≥0)

Distinct: 966
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41099.77542
Minimum: 1000
Maximum: 1000000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 1000
5-th percentile: 10000
Q1: 20000
median: 30000
Q3: 48000
95-th percentile: 100000
Maximum: 1000000
Range: 999000
Interquartile range (IQR): 28000

Descriptive statistics

Standard deviation: 46166.3224
Coefficient of variation (CV): 1.123274323
Kurtosis: 104.0283418
Mean: 41099.77542
Median Absolute Deviation (MAD): 12000
Skewness: 7.702438065
Sum: 3033122326
Variance: 2131329324
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
30000               7009   9.5%
25000               6061   8.2%
20000               5471   7.4%
40000               4935   6.7%
50000               4470   6.1%
35000               4231   5.7%
15000               3949   5.4%
60000               2523   3.4%
45000               2486   3.4%
18000               1832   2.5%
Other values (956)  30832  41.8%

Value  Count  Frequency (%)
1000   5      < 0.1%
1100   1      < 0.1%
1500   2      < 0.1%
1700   1      < 0.1%
2000   1      < 0.1%
2400   2      < 0.1%
2450   1      < 0.1%
2500   1      < 0.1%
3000   4      < 0.1%
3073   1      < 0.1%

Value    Count  Frequency (%)
1000000  10     < 0.1%
999999   3      < 0.1%
999000   2      < 0.1%
990000   1      < 0.1%
950000   4      < 0.1%
947000   1      < 0.1%
900000   6      < 0.1%
850011   1      < 0.1%
830000   1      < 0.1%
800000   8      < 0.1%

age
Real number (ℝ≥0)

Distinct: 52
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 39.28064066
Minimum: 21
Maximum: 72
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 21
5-th percentile: 24
Q1: 30
median: 37
Q3: 48
95-th percentile: 60
Maximum: 72
Range: 51
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 11.5203779
Coefficient of variation (CV): 0.2932838595
Kurtosis: -0.7349138354
Mean: 39.28064066
Median Absolute Deviation (MAD): 9
Skewness: 0.4747453868
Sum: 2898872
Variance: 132.7191069
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
31                 2727   3.7%
28                 2705   3.7%
30                 2693   3.6%
29                 2659   3.6%
27                 2645   3.6%
26                 2528   3.4%
32                 2501   3.4%
34                 2395   3.2%
33                 2314   3.1%
35                 2259   3.1%
Other values (42)  48373  65.5%

Value  Count  Frequency (%)
21     849    1.2%
22     937    1.3%
23     1509   2.0%
24     1873   2.5%
25     2202   3.0%
26     2528   3.4%
27     2645   3.6%
28     2705   3.7%
29     2659   3.6%
30     2693   3.6%

Value  Count  Frequency (%)
72     2      < 0.1%
71     3      < 0.1%
70     32     < 0.1%
69     81     0.1%
68     165    0.2%
67     248    0.3%
66     309    0.4%
65     422    0.6%
64     449    0.6%
63     477    0.6%

region_rating
Real number (ℝ≥0)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.672570089
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 4
median: 4
Q3: 5
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.30557172
Coefficient of variation (CV): 0.2794119072
Kurtosis: -0.628301709
Mean: 4.672570089
Median Absolute Deviation (MAD): 1
Skewness: 0.4815443039
Sum: 344831
Variance: 1.704517516
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
4      27523  37.3%
5      16075  21.8%
3      12027  16.3%
7      11469  15.5%
6      6199   8.4%
2      300    0.4%
1      206    0.3%

Value  Count  Frequency (%)
1      206    0.3%
2      300    0.4%
3      12027  16.3%
4      27523  37.3%
5      16075  21.8%
6      6199   8.4%
7      11469  15.5%

Value  Count  Frequency (%)
7      11469  15.5%
6      6199   8.4%
5      16075  21.8%
4      27523  37.3%
3      12027  16.3%
2      300    0.4%
1      206    0.3%
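
Because region_rating takes only seven values, its quantile statistics can be recovered from the frequency table alone. The sketch below uses a nearest-rank style rule (the smallest value whose cumulative count reaches q * n). This is an assumption: the report may have used an interpolating method, but for this column every requested quantile falls well inside a run of equal values, so the two approaches should agree.

```python
from itertools import accumulate

# value -> count, from the region_rating frequency table.
counts = {1: 206, 2: 300, 3: 12027, 4: 27523, 5: 16075, 6: 6199, 7: 11469}

values = sorted(counts)
cumulative = list(accumulate(counts[v] for v in values))
n = cumulative[-1]

def quantile(q):
    """Smallest value whose cumulative count reaches q * n (nearest-rank style)."""
    for value, cum in zip(values, cumulative):
        if cum >= q * n:
            return value
    return values[-1]

summary = {q: quantile(q) for q in (0.05, 0.25, 0.50, 0.75, 0.95)}
weighted_sum = sum(value * count for value, count in counts.items())
print(n, weighted_sum, summary)
```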

sna
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB
Category counts: 1 (47301), 4 (11749), 2 (10626), 3 (4123)

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters73,799
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row4
2nd row4
3rd row1
4th row1
5th row1

Common Values

Value  Count  Frequency (%)
1      47301  64.1%
4      11749  15.9%
2      10626  14.4%
3      4123   5.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
147301
64.1%
411749
 
15.9%
210626
 
14.4%
34123
 
5.6%

Most occurring characters

ValueCountFrequency (%)
147301
64.1%
411749
 
15.9%
210626
 
14.4%
34123
 
5.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number73799
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
147301
64.1%
411749
 
15.9%
210626
 
14.4%
34123
 
5.6%

Most occurring scripts

ValueCountFrequency (%)
Common73799
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
147301
64.1%
411749
 
15.9%
210626
 
14.4%
34123
 
5.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII73799
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
147301
64.1%
411749
 
15.9%
210626
 
14.4%
34123
 
5.6%

first_time
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB
Category counts: 3 (31255), 4 (18737), 1 (12239), 2 (11568)

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters73,799
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row4
4th row3
5th row4

Common Values

Value  Count  Frequency (%)
3      31255  42.4%
4      18737  25.4%
1      12239  16.6%
2      11568  15.7%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
331255
42.4%
418737
25.4%
112239
 
16.6%
211568
 
15.7%

Most occurring characters

ValueCountFrequency (%)
331255
42.4%
418737
25.4%
112239
 
16.6%
211568
 
15.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number73799
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
331255
42.4%
418737
25.4%
112239
 
16.6%
211568
 
15.7%

Most occurring scripts

ValueCountFrequency (%)
Common73799
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
331255
42.4%
418737
25.4%
112239
 
16.6%
211568
 
15.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII73799
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
331255
42.4%
418737
25.4%
112239
 
16.6%
211568
 
15.7%

education
Real number (ℝ≥0)

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.823669697
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 2
median: 2
Q3: 4
95-th percentile: 4
Maximum: 6
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 0.9594364793
Coefficient of variation (CV): 0.3397835378
Kurtosis: -1.177319922
Mean: 2.823669697
Median Absolute Deviation (MAD): 0
Skewness: 0.5168116357
Sum: 208384
Variance: 0.9205183578
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value  Count  Frequency (%)
2      38860  52.7%
4      23365  31.7%
3      9816   13.3%
5      1257   1.7%
1      307    0.4%
6      194    0.3%

Value  Count  Frequency (%)
1      307    0.4%
2      38860  52.7%
3      9816   13.3%
4      23365  31.7%
5      1257   1.7%
6      194    0.3%

Value  Count  Frequency (%)
6      194    0.3%
5      1257   1.7%
4      23365  31.7%
3      9816   13.3%
2      38860  52.7%
1      307    0.4%

Distinct: 120
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB
Minimum: 2014-01-01 00:00:00
Maximum: 2014-04-30 00:00:00
Histogram with fixed size bins (bins=50)

good_work
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB
Category counts: 0 (61630), 1 (12169)

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters73,799
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row1

Common Values

Value  Count  Frequency (%)
0      61630  83.5%
1      12169  16.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
061630
83.5%
112169
 
16.5%

Most occurring characters

ValueCountFrequency (%)
061630
83.5%
112169
 
16.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number73799
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
061630
83.5%
112169
 
16.5%

Most occurring scripts

ValueCountFrequency (%)
Common73799
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
061630
83.5%
112169
 
16.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII73799
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
061630
83.5%
112169
 
16.5%

car
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB
Category counts: 0 (49832), 1 (23967)

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters73,799
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row0
3rd row1
4th row0
5th row0

Common Values

Value  Count  Frequency (%)
0      49832  67.5%
1      23967  32.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
049832
67.5%
123967
32.5%

Most occurring characters

ValueCountFrequency (%)
049832
67.5%
123967
32.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number73799
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
049832
67.5%
123967
32.5%

Most occurring scripts

ValueCountFrequency (%)
Common73799
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
049832
67.5%
123967
32.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII73799
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
049832
67.5%
123967
32.5%

car_type
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB
Category counts: 0 (59791), 1 (14008)

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters73,799
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row0
3rd row0
4th row0
5th row0

Common Values

Value  Count  Frequency (%)
0      59791  81.0%
1      14008  19.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
059791
81.0%
114008
 
19.0%

Most occurring characters

ValueCountFrequency (%)
059791
81.0%
114008
 
19.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number73799
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
059791
81.0%
114008
 
19.0%

Most occurring scripts

ValueCountFrequency (%)
Common73799
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
059791
81.0%
114008
 
19.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII73799
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
059791
81.0%
114008
 
19.0%

foreign_passport
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB
Category counts: 0 (62733), 1 (11066)

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters73,799
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row1
4th row0
5th row1

Common Values

Value  Count  Frequency (%)
0      62733  85.0%
1      11066  15.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
062733
85.0%
111066
 
15.0%

Most occurring characters

ValueCountFrequency (%)
062733
85.0%
111066
 
15.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number73799
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
062733
85.0%
111066
 
15.0%

Most occurring scripts

ValueCountFrequency (%)
Common73799
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
062733
85.0%
111066
 
15.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII73799
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
062733
85.0%
111066
 
15.0%

sex_male
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB
Category counts: 0 (41562), 1 (32237)

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters73,799
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row0
3rd row1
4th row0
5th row1

Common Values

Value  Count  Frequency (%)
0      41562  56.3%
1      32237  43.7%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
041562
56.3%
132237
43.7%

Most occurring characters

ValueCountFrequency (%)
041562
56.3%
132237
43.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number73799
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
041562
56.3%
132237
43.7%

Most occurring scripts

ValueCountFrequency (%)
Common73799
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
041562
56.3%
132237
43.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII73799
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
041562
56.3%
132237
43.7%

home_address_1
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB
Category counts: 0.0 (41214), 1.0 (32585)

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters221,397
Distinct characters3
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1.0
2nd row0.0
3rd row1.0
4th row0.0
5th row0.0

Common Values

Value  Count  Frequency (%)
0.0    41214  55.8%
1.0    32585  44.2%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
0.041214
55.8%
1.032585
44.2%

Most occurring characters

Value  Count   Frequency (%)
0      115013  51.9%
.      73799   33.3%
1      32585   14.7%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     147598  66.7%
Other Punctuation  73799   33.3%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      115013  77.9%
1      32585   22.1%
Other Punctuation
Value  Count   Frequency (%)
.      73799   100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  221397  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      115013  51.9%
.      73799   33.3%
1      32585   14.7%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  221397  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      115013  51.9%
.      73799   33.3%
1      32585   14.7%
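
The character statistics for home_address_1 follow from its two category counts, given that the values appear as three-character strings ("0.0" and "1.0"). The sketch below rebuilds the per-character counts and looks up each character's Unicode general category with the standard library. The mapping of `Nd` to "Decimal Number" and `Po` to "Other Punctuation" is the Unicode naming, assumed to be what the report displays.

```python
from collections import Counter
import unicodedata

# Category counts from the home_address_1 section; values are strings like "0.0".
value_counts = {"0.0": 41214, "1.0": 32585}

char_counts = Counter()
for text, count in value_counts.items():
    for ch in text:
        char_counts[ch] += count  # each occurrence of the value adds its characters

total_chars = sum(char_counts.values())
categories = {ch: unicodedata.category(ch) for ch in char_counts}

print(char_counts.most_common())
print(total_chars, categories)
```

Storing the flag as an integer or boolean, so that it is not read as the text "0.0" or "1.0", would likely reduce these character-level tables to a plain 0/1 breakdown like the one shown for `car` or `good_work`.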

home_address_2
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB
Category counts: 1.0 (39956), 0.0 (33843)

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters221,397
Distinct characters3
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0.0
2nd row1.0
3rd row0.0
4th row1.0
5th row1.0

Common Values

Value  Count  Frequency (%)
1.0    39956  54.1%
0.0    33843  45.9%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1.039956
54.1%
0.033843
45.9%

Most occurring characters

ValueCountFrequency (%)
0107642
48.6%
.73799
33.3%
139956
 
18.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number147598
66.7%
Other Punctuation73799
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0107642
72.9%
139956
 
27.1%
Other Punctuation
ValueCountFrequency (%)
.73799
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common221397
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0107642
48.6%
.73799
33.3%
139956
 
18.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII221397
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0107642
48.6%
.73799
33.3%
139956
 
18.0%

home_address_3
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB
Category counts: 0.0 (72541), 1.0 (1258)

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters221,397
Distinct characters3
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row0.0
5th row0.0

Common Values

Value  Count  Frequency (%)
0.0    72541  98.3%
1.0    1258   1.7%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
0.072541
98.3%
1.01258
 
1.7%

Most occurring characters

ValueCountFrequency (%)
0146340
66.1%
.73799
33.3%
11258
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number147598
66.7%
Other Punctuation73799
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0146340
99.1%
11258
 
0.9%
Other Punctuation
ValueCountFrequency (%)
.73799
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common221397
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0146340
66.1%
.73799
33.3%
11258
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII221397
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0146340
66.1%
.73799
33.3%
11258
 
0.6%

work_address_1
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB
Category counts: 0.0 (65465), 1.0 (8334)

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters221,397
Distinct characters3
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row0.0
5th row0.0

Common Values

Value  Count  Frequency (%)
0.0    65465  88.7%
1.0    8334   11.3%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
0.065465
88.7%
1.08334
 
11.3%

Most occurring characters

ValueCountFrequency (%)
0139264
62.9%
.73799
33.3%
18334
 
3.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number147598
66.7%
Other Punctuation73799
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0139264
94.4%
18334
 
5.6%
Other Punctuation
ValueCountFrequency (%)
.73799
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common221397
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0139264
62.9%
.73799
33.3%
18334
 
3.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII221397
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0139264
62.9%
.73799
33.3%
18334
 
3.8%

work_address_2
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count
0.0    53293
1.0    20506

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 221,397
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 0.0
3rd row: 1.0
4th row: 0.0
5th row: 0.0

Common Values

Value  Count  Frequency (%)
0.0    53293  72.2%
1.0    20506  27.8%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]

Value  Count  Frequency (%)
0.0    53293  72.2%
1.0    20506  27.8%

Most occurring characters

Value  Count   Frequency (%)
0      127092  57.4%
.      73799   33.3%
1      20506   9.3%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     147598  66.7%
Other Punctuation  73799   33.3%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      127092  86.1%
1      20506   13.9%

Other Punctuation
Value  Count  Frequency (%)
.      73799  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  221397  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      127092  57.4%
.      73799   33.3%
1      20506   9.3%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  221397  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      127092  57.4%
.      73799   33.3%
1      20506   9.3%

work_address_3
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count
1.0    44959
0.0    28840

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 221,397
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 1.0
3rd row: 0.0
4th row: 1.0
5th row: 1.0

Common Values

Value  Count  Frequency (%)
1.0    44959  60.9%
0.0    28840  39.1%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]

Value  Count  Frequency (%)
1.0    44959  60.9%
0.0    28840  39.1%

Most occurring characters

Value  Count   Frequency (%)
0      102639  46.4%
.      73799   33.3%
1      44959   20.3%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     147598  66.7%
Other Punctuation  73799   33.3%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      102639  69.5%
1      44959   30.5%

Other Punctuation
Value  Count  Frequency (%)
.      73799  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  221397  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      102639  46.4%
.      73799   33.3%
1      44959   20.3%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  221397  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      102639  46.4%
.      73799   33.3%
1      44959   20.3%

default
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count
0      64427
1      9372

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 73,799
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      64427  87.3%
1      9372   12.7%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]

Value  Count  Frequency (%)
0      64427  87.3%
1      9372   12.7%

Most occurring characters

Value  Count  Frequency (%)
0      64427  87.3%
1      9372   12.7%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  73799  100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      64427  87.3%
1      9372   12.7%

Most occurring scripts

Value   Count  Frequency (%)
Common  73799  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      64427  87.3%
1      9372   12.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  73799  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      64427  87.3%
1      9372   12.7%

sample
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count
1      73799

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 73,799
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      73799  100.0%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]

Value  Count  Frequency (%)
1      73799  100.0%

Most occurring characters

Value  Count  Frequency (%)
1      73799  100.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  73799  100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      73799  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  73799  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      73799  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  73799  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1      73799  100.0%

Interactions

[Figures: 64 interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the slope does not affect r; only its sign does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
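
The calculation described above can be sketched in plain Python. The function name `pearson_r` and the toy data are assumptions made for this illustration, not part of the report; the 1/n factors of the covariance and the standard deviations cancel, so they are omitted. A constant input (such as the `sample` column) has zero standard deviation, so the sketch raises an error in that case.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of (co)deviations; the common 1/n factor cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("r is undefined for a constant variable")
    return cov / (dev_x * dev_y)


xs = [1.0, 2.0, 3.0, 4.0]
steep = [10.0 * x + 5.0 for x in xs]    # steep rising line
shallow = [0.5 * x - 2.0 for x in xs]   # shallow rising line
falling = [-3.0 * x for x in xs]        # falling line
print(pearson_r(xs, steep), pearson_r(xs, shallow), pearson_r(xs, falling))
```

Both rising lines give r of (approximately) 1 regardless of slope, while the falling line gives approximately -1, which illustrates the location and scale invariance.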

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
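
A minimal sketch of this rank-based calculation follows. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the cubic toy data are assumptions for illustration; tied values receive the average of their ranks, which is one common convention.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations (1/n cancels)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 3 for x in xs]  # monotonic but nonlinear
print(pearson_r(xs, ys), spearman_rho(xs, ys))
```

For this monotonic but nonlinear relationship, Pearson's r stays below 1 while ρ is (approximately) 1.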

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
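
The pair-counting definition can be sketched directly. The function name `kendall_tau_a` and the toy data are assumptions for this example. The sketch implements the plain formula stated above (often called tau-a); statistical libraries frequently use a tie-adjusted variant instead, so results may differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        pairs += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:      # both variables move the same way
            concordant += 1
        elif direction < 0:    # the variables move in opposite ways
            discordant += 1
        # direction == 0 is a tie and counts toward neither group
    if pairs == 0:
        raise ValueError("need at least two observations")
    return (concordant - discordant) / pairs


xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]  # one swapped pair among six
print(kendall_tau_a(xs, ys))
```

With four observations there are six pairs; five are concordant and one is discordant, so τ is (5 - 1) / 6.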

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
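
The following is a sketch of the bias-corrected estimator as it is commonly stated, computed from a contingency table of counts. The function name `cramers_v_corrected` and the two toy tables are assumptions for illustration; the code assumes every row and column total is non-zero and that the table has at least two rows and two columns, and it may differ in detail from what the report's generator does internally.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V from a contingency table (list of rows of counts)."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic without continuity correction.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Corrections: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


independent = [[10, 20], [20, 40]]  # rows are proportional: no association
perfect = [[50, 0], [0, 50]]        # each row determines the column
print(cramers_v_corrected(independent), cramers_v_corrected(perfect))
```

The independent table yields 0 and the perfectly associated table yields (approximately) 1, matching the range described above.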

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

client_id score_bki decline_app_cnt bki_request_cnt income age region_rating sna first_time education app_date good_work car car_type foreign_passport sex_male home_address_1 home_address_2 home_address_3 work_address_1 work_address_2 work_address_3 default sample
025905-2.00875301180006244122014-02-01011011.00.00.00.01.00.001
163161-1.53227603190005944122014-03-12000000.01.00.00.00.01.001
225887-1.40814221300002571422014-02-01010111.00.00.00.01.00.001
316222-2.05747102100005341322014-01-23000000.01.00.00.00.01.001
4101655-1.24472301300004851442014-04-18100110.01.00.00.00.01.001
541415-2.03225700150002742322014-02-18110011.00.00.01.00.00.001
628436-2.22500400280003951122014-02-04000011.00.00.00.01.00.001
768769-1.52273901450003943322014-03-17000000.01.00.00.00.01.001
838424-1.67606110300005041422014-02-14010001.00.00.01.00.00.001
94496-2.69517601240005441332014-01-10000000.01.00.00.00.01.001

Last rows

client_id score_bki decline_app_cnt bki_request_cnt income age region_rating sna first_time education app_date good_work car car_type foreign_passport sex_male home_address_1 home_address_2 home_address_3 work_address_1 work_address_2 work_address_3 default sample
7378944132-1.66267400250004044332014-02-20000010.01.00.00.01.00.001
7379087499-1.601775011006002451442014-04-03011011.00.00.01.00.00.001
7379137195-1.07749202900005461322014-02-13010001.00.00.00.01.00.001
7379282387-2.15753001450003771232014-03-30010100.01.00.00.00.01.001
737936266-1.47089100350004871422014-01-13000100.01.00.00.00.01.001
7379454887-1.79206403170004544442014-03-04000001.00.00.00.01.00.001
7379576821-2.05802901700004141422014-03-24011010.01.00.00.01.00.001
73796103695-1.51263504450003172222014-04-22000010.01.00.00.00.01.001
73797861-1.47933403130002942322014-01-04100001.00.00.00.00.01.011
7379815796-1.76471102250003441342014-01-23000010.01.00.00.00.01.001